Rollout sampling approximate policy iteration
Similar resources
Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration
Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem, have been proposed recently. Finding good policies with such methods requires not only an appropriate classifier, but also reliable examples of best actions, covering the state space sufficiently. Up to this ti...
Rollout Sampling Policy Iteration for Decentralized POMDPs
We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout...
Algorithms and Bounds for Sampling-based Approximate Policy Iteration
Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem, have been proposed recently. Finding good policies with such methods requires not only an appropriate classifier, but also reliable examples for the best actions, covering all of the state space. One major ques...
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
Rollout Allocation Strategies for Classification-based Policy Iteration
Classification-based policy iteration algorithms are variations of policy iteration that do not use any kind of value function representation. The main idea is 1) to replace the usual value function learning step with rollout estimates of the value function over a finite number of states, called the rollout set, and the actions in the action space, and 2) to cast the policy improvement step as ...
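The rollout step described in this abstract can be illustrated with a minimal Python sketch. It is not taken from any of the papers listed here: the function names (`rollout_q`, `label_rollout_set`), the toy chain environment, and all parameter values are assumptions chosen for illustration. The sketch estimates the value of each action at each state in a rollout set by simulating the current policy, then records the empirically best action as a labeled example that a classifier could be trained on.

```python
import random


def rollout_q(step, policy, state, action, horizon=20, gamma=0.9,
              n_rollouts=10, rng=None):
    """Monte Carlo estimate of the value of taking `action` in `state`
    and following `policy` afterwards (truncated at `horizon` steps)."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng)   # simulate one transition
            ret += discount * r
            discount *= gamma
            a = policy(s)            # later actions come from the policy
        total += ret
    return total / n_rollouts


def label_rollout_set(step, policy, rollout_set, actions, **kwargs):
    """For each state in the rollout set, pair it with the action whose
    rollout estimate is highest (ties go to the first action listed)."""
    examples = []
    for s in rollout_set:
        q = {a: rollout_q(step, policy, s, a, **kwargs) for a in actions}
        examples.append((s, max(q, key=q.get)))
    return examples


# Hypothetical toy environment: a chain of states 0..4 where moving
# into state 4 yields reward 1 and every other transition yields 0.
def chain_step(s, a, rng):
    nxt = min(4, max(0, s + a))
    return nxt, (1.0 if nxt == 4 else 0.0)


def always_left(s):
    return -1


examples = label_rollout_set(chain_step, always_left,
                             rollout_set=[1, 2, 3], actions=[-1, 1])
```

The resulting `examples` list of (state, action) pairs would serve as the training set for whatever classifier represents the next policy. How many rollouts to spend on each state and action is fixed here by `n_rollouts`; this uniform allocation is a simplifying assumption of the sketch.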
Journal
Journal title: Machine Learning
Year: 2008
ISSN: 0885-6125,1573-0565
DOI: 10.1007/s10994-008-5069-3